2016 SEG Machine Learning Contest
This notebook analyzes the relationship between the nonmarine vs. marine log indicator (NM_M in the dataset) and facies. The idea is that, for each facies, nearly all observations should belong to either the nonmarine class or the marine class. If this is the case, it may be prudent to develop two separate classifiers split on the nonmarine vs. marine indicator. This could reduce misclassification (observations with a nonmarine indicator could only be classified as a nonmarine facies, and likewise for marine). On the other hand, the machine learning algorithm may already have "learned" this relationship, in which case the split will perform the same or worse.
After loading the data, I will examine the distribution of the NM_M feature across each facies and determine which facies may be considered nonmarine in origin and which marine. I will then apply the Support Vector Machine algorithm to (A) the entire dataset (as in jpoirier001.ipynb) and (B) the nonmarine and marine subsets separately, to assess any improvement in predictive accuracy. Here, accuracy will be measured by the average F1 score across all classes when applied to the Newby well, which is held out as a blind test and not used when training the models.
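To make that scoring concrete, here is a minimal sketch of the metric in base R: the per-class F1 is the harmonic mean of precision and recall, and the overall score is the unweighted mean across facies. The function name macro_f1 and its inputs are illustrative, not part of the contest code.

# Sketch of the evaluation metric: macro-averaged F1 across facies.
# `pred` and `actual` are hypothetical factor vectors with the same levels.
macro_f1 <- function(pred, actual) {
    cm <- table(Prediction = pred, Reference = actual)
    precision <- diag(cm) / rowSums(cm)   # TP / all predicted as this class
    recall    <- diag(cm) / colSums(cm)   # TP / all actually this class
    f1 <- 2 * precision * recall / (precision + recall)
    mean(f1, na.rm = TRUE)                # unweighted (macro) average
}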
First, let's load packages required to perform our analysis.
In [2]:
# visualization packages
library(repr)
library(ggplot2)
library(ggthemes)
library(cowplot)
# machine learning packages
library(caret)
library(e1071)
In [3]:
# load data
fname <- "../facies_vectors.csv"
data <- read.csv(fname, colClasses=c(rep("factor",3), rep("numeric",6), "factor", "numeric"))
# display the first few rows of the data set and its dimensions
head(data)
paste(dim(data)[1], "rows x", dim(data)[2], "columns")
In [4]:
# hold out the Newby well as a blind test set; train on the remaining wells
blind <- data[data$Well.Name == 'NEWBY',]
data <- data[data$Well.Name != 'NEWBY',]
This data is from the Panoma Council Grove Field, a predominantly gas reservoir covering over 2700 sq mi in southwest Kansas. The dataset comes from nine wells with 4149 samples. Each sample consists of seven predictor variables and a rock facies label. The validation (test) data comprise 830 samples from two wells with the same seven predictor variables. Facies are based on examination of cores from the nine wells, taken vertically at half-foot intervals. The predictor variables include five wireline log measurements and two geologic constraining variables that are derived from geologic knowledge and sampled at the same half-foot rate.
The seven predictor variables are:
- Five wireline log measurements: gamma ray (GR), resistivity (ILD_log10), photoelectric effect (PE), neutron-density porosity difference (DeltaPHI), and average neutron-density porosity (PHIND).

The two geologic constraining variables are:
- The nonmarine vs. marine indicator (NM_M) and the relative position within the reservoir interval (RELPOS).

The nine discrete facies (classes of rock) are:
1. SS: Nonmarine sandstone
2. CSiS: Nonmarine coarse siltstone
3. FSiS: Nonmarine fine siltstone
4. SiSh: Marine siltstone and shale
5. MS: Mudstone (limestone)
6. WS: Wackestone (limestone)
7. D: Dolomite
8. PS: Packstone-grainstone (limestone)
9. BS: Phylloid-algal bafflestone (limestone)
These facies are not truly discrete; they gradually blend into one another, and some facies have neighbors that are quite close in character, so mislabeling among neighboring facies is to be expected. The following table lists the facies, their abbreviated labels, and their approximate neighbors (a small sketch of this adjacency notion follows the table).
Facies | Label | Adjacent Facies
---|---|---
1 | SS | 2
2 | CSiS | 1,3
3 | FSiS | 2
4 | SiSh | 5
5 | MS | 4,6
6 | WS | 5,7,8
7 | D | 6,8
8 | PS | 6,7,9
9 | BS | 7,8
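Here is a minimal sketch of a check that accepts a prediction when it lands on the true facies or one of its listed neighbors. The names adjacent and adjacent_ok are illustrative; the version actually used for scoring is the adj_cm_table helper defined further below.

# adjacency relationships from the table above, keyed by integer facies code
adjacent <- list(`1`=c(2), `2`=c(1,3), `3`=c(2),
                 `4`=c(5), `5`=c(4,6), `6`=c(5,7,8),
                 `7`=c(6,8), `8`=c(6,7,9), `9`=c(7,8))
# TRUE when the prediction equals the true facies or one of its neighbors
adjacent_ok <- function(pred, actual) {
    pred == actual || pred %in% adjacent[[as.character(actual)]]
}
adjacent_ok(7, 6)  # TRUE - predicting D for a true WS counts as an adjacent hit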
Now let's define a colormap for the facies such that they are represented by consistent colors in this tutorial. We'll also take a peek at the statistical distribution of the input variables.
In [5]:
# 1=sandstone, 2=c_siltstone, 3=f_siltstone, 4=marine_silt_shale, 5=mudstone,
# 6=wackestone, 7=dolomite, 8=packstone, 9=bafflestone
facies_colors <- c('#F4D03F', '#F5B041', '#DC7633', '#6E2C00', '#1B4F72', '#2E86C1', '#AED6F1', '#A569BD', '#196F3D')
facies_labels <- c('SS', 'CSiS', 'FSiS', 'SiSh', 'MS', 'WS', 'D', 'PS', 'BS')
summary(data)
Looking at the statistical summary of the input variables, we can see that all inputs except PE (photoelectric effect) are free of NAs. For this tutorial, we will drop the feature vectors that do not have a valid PE entry.
In [6]:
# keep only rows with no missing values (PE is the only column with NAs)
PE_mask <- complete.cases(data)
data <- data[PE_mask,]
paste(dim(data)[1], "rows x", dim(data)[2], "columns")
Out of the original 4149 samples, we will be training our model on 2769 (after removing the Newby well and the samples without a valid PE). Now let's build some familiar log plots!
In [7]:
logplot <- function(x, incl_fac=TRUE, incl_pred=FALSE) {
# GR gamma ray track
g1 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_continuous(lim=c(0,400), breaks=seq(0,400,100), labels=c("0"="0","100"="","200"="200","300"="","400"="400")) +
scale_x_continuous(trans="reverse") + coord_flip() + labs(title="", x="Depth", y="GR") +
geom_bar(stat="identity", data=x, aes(x=Depth, y=GR, fill=GR, alpha=0.5), width=0.5) +
geom_line(aes(x=Depth, y=GR), lwd=.5, col='black') +
scale_fill_continuous(limits=c(0,225), low="yellow", high="black") +
theme(panel.grid.major.x = element_line(colour="gray", size=0.5), legend.position="none",
axis.text=element_text(size=6), axis.title=element_text(size=8,face="bold"))
g1 <- switch_axis_position(g1, 'x')
# ILD resistivity track (transform it back to actual units)
g2 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_log10(lim=c(0.1,50), breaks=c(.1,.2,.4,.6,.8,1,2,4,6,8,10,20,40),
labels=c(".1"=".1",".2"="",".4"="",".6"="",".8"="",
"1"="1","2"="","4"="","6"="","8"="","10"="10",
"20"="","40"="")) +
scale_x_continuous(trans="reverse") +
coord_flip() + labs(title="", x="", y="ILD") +
geom_line(aes(x=Depth, y=10^ILD_log10), lwd=.5, col="skyblue4") +
theme(panel.grid.major.x = element_line(colour="gray", size=0.25), legend.position="none",
axis.text=element_text(size=6), axis.title=element_text(size=8,face="bold"), axis.text.y=element_blank())
g2 <- switch_axis_position(g2, 'x')
# DeltaPhi track
g3 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_continuous(lim=c(-20,20), breaks=seq(-20,20,10),labels=c("-20"="-20","-10"="","0"="0","10"="","20"="20")) +
scale_x_continuous(trans="reverse") + coord_flip() + labs(title="", x="", y="DeltaPhi") +
geom_line(aes(x=Depth, y=DeltaPHI), lwd=.5, col="seagreen4") +
theme(panel.grid.major.x = element_line(colour="gray", size=0.25), legend.position="none",
axis.text=element_text(size=6), axis.title=element_text(size=8,face="bold"), axis.text.y=element_blank())
g3 <- switch_axis_position(g3, 'x')
# PHIND neutron porosity track
g4 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_continuous(lim=c(0,50), breaks=c(0,15,30,45)) + scale_x_continuous(trans="reverse") +
coord_flip() + labs(title="", x="", y="PHIND") +
geom_line(aes(x=Depth, y=PHIND), lwd=.5, col="firebrick") +
theme(panel.grid.major.x = element_line(colour="gray", size=0.25), legend.position="none",
axis.text=element_text(size=6), axis.title=element_text(size=8,face="bold"), axis.text.y=element_blank())
g4 <- switch_axis_position(g4, 'x')
# PE photoelectric effect track
g5 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_continuous(lim=c(0,8), breaks=c(0,2,4,6,8)) + scale_x_continuous(trans="reverse") +
coord_flip() + labs(title="", x="", y="PE") +
geom_line(aes(x=Depth, y=PE), lwd=.5, col="black") +
theme(panel.grid.major.x = element_line(colour="gray", size=0.25), legend.position="none",
axis.text=element_text(size=6), axis.title=element_text(size=8,face="bold"), axis.text.y=element_blank())
g5 <- switch_axis_position(g5, 'x')
# constant-height column used to draw full-width facies/prediction color bars
x$ones <- rep(1, nrow(x))
# build a facies track if we are to include
if (incl_fac) {
g6 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_continuous(lim=c(-0.1,1.1), breaks=c(0,1), labels=c("0"="", "1"="")) + scale_x_continuous(trans="reverse") +
coord_flip() + labs(title="", x="", y="Facies") +
geom_bar(stat="identity", data=x, aes(x=Depth, y=ones, fill=Facies), width=0.5) +
scale_fill_manual(values=facies_colors, drop=F, labels=facies_labels) +
theme(axis.title=element_text(size=8,face="bold"), axis.text.y=element_blank(), axis.text.x=element_text(size=6))
}
# build a prediction track if we are to include
if (incl_pred) {
# build Predicted Facies track
g7 <- ggplot(x) + theme_economist_white(gray_bg=T) +
scale_y_continuous(lim=c(-0.1,1.1), breaks=c(0,1), labels=c("0"="", "1"="")) + scale_x_continuous(trans="reverse") +
coord_flip() + labs(title="", x="", y="Predicted") +
geom_bar(stat="identity", data=x, aes(x=Depth, y=ones, fill=Predicted), width=0.5) +
scale_fill_manual(values=facies_colors, drop=F, labels=facies_labels) +
theme(legend.position="right", legend.text=element_text(size=6), legend.title=element_blank()) +
theme(axis.title=element_text(size=8,face="bold"), axis.text.y=element_blank(), axis.text.x=element_text(size=6))
g7 <- switch_axis_position(g7, 'x')
# finish off Facies track with no legend if we are to include
if (incl_fac) {
g6 <- g6 + theme(legend.position="none")
g6 <- switch_axis_position(g6, 'x')
# bring all the tracks together as a grid
g <- plot_grid(g1, g2, g3, g4, g5, g6, g7, ncol=7, rel_widths=c(4,3,3,3,3,2,5))
}
else {
# bring all the tracks together as a grid
g <- plot_grid(g1, g2, g3, g4, g5, g7, ncol=6, rel_widths=c(4,3,3,3,3,5))
}
ggdraw() + draw_plot(g, width=1, height=1) + draw_plot_label(x$Well.Name[1], size=10)
}
else {
if (incl_fac) {
# finish off Facies track with a legend
g6 <- g6 + theme(legend.position="right", legend.text=element_text(size=6), legend.title=element_blank())
g6 <- switch_axis_position(g6, 'x')
# bring all the tracks together as a grid
g <- plot_grid(g1, g2, g3, g4, g5, g6, ncol=6, rel_widths=c(4,3,3,3,3,6))
}
else {
# bring all the tracks together as a grid
g <- plot_grid(g1, g2, g3, g4, g5, ncol=5, rel_widths=c(4,3,3,3,3))
}
ggdraw() + draw_plot(g, width=1, height=1) + draw_plot_label(x$Well.Name[1], size=10)
}
}
In [8]:
options(repr.plot.width=8, repr.plot.height=5)
# plot logs for the Shrimplin and Shankle wells
logplot(data[data$Well.Name == "SHRIMPLIN",])
logplot(data[data$Well.Name == "SHANKLE",])
In [9]:
options(repr.plot.width=8, repr.plot.height=4)
# relabel the NM_M factor levels with descriptive strings (reads and plots more nicely)
levels(data$NM_M)[levels(data$NM_M)=="1"] <- "Nonmarine"
levels(data$NM_M)[levels(data$NM_M)=="2"] <- "Marine"
# build histogram faceted on the NM_M (nonmarine vs marine) feature
g <- ggplot(data, aes(x=Facies)) + theme_economist_white(gray_bg=T) +
facet_grid(. ~ NM_M) +
geom_bar(aes(x=Facies, fill=Facies)) + labs(title="Distribution of Facies", x="", y="") +
scale_x_discrete(labels=facies_labels) +
scale_fill_manual(values=facies_colors, drop=F, labels=facies_labels) +
theme(legend.position="none", legend.title=element_blank(), legend.text=element_text(size=6),
axis.text=element_text(size=6), plot.title=element_text(size=10), axis.title=element_blank(),
axis.ticks.x=element_blank())
g
That's a pretty interesting visual. The SS, CSiS, and FSiS facies appear to be nonmarine, while the remaining facies appear to be marine. There are, however, a handful of cross-classifications; let's make a table to quantify the percentage of observations that do not follow this trend.
In [10]:
# relabel the facies factor levels with their short names ("1".."9" -> "SS".."BS")
levels(data$Facies) <- facies_labels
# count observations of facies which are nonmarine or marine
t <- table(data$Facies, data$NM_M == "Marine")
# calculate those counts as percentages
nm_percent <- round(100 * t[,1] / (t[,1] + t[,2]),0)
m_percent <- round(100 * t[,2] / (t[,1] + t[,2]),0)
t <- as.table(cbind(t[,1], t[,2], nm_percent, m_percent))
# format the table and output
dimnames(t)[[2]] <- c('# Nonmarine', '# Marine', '% Nonmarine', '% Marine')
t
MS (mudstone) is the facies with the most uncertainty here: of the 189 observations of the MS facies, 11 (6%) carry a nonmarine indicator. It's probably safe to call SS, CSiS, and FSiS nonmarine and the remaining facies marine. How many observations overall do not fit this split of the data?
In [11]:
# sum up how many correctly/incorrectly fall under our "rule"
nm_m_true <- t[1,1] + t[2,1] + t[3,1] + t[4,2] + t[5,2] + t[6,2] + t[7,2] + t[8,2] + t[9,2]
nm_m_false <- t[1,2] + t[2,2] + t[3,2] + t[4,1] + t[5,1] + t[6,1] + t[7,1] + t[8,1] + t[9,1]
# percentage of observations not falling under our "rule"
paste(round(100 * nm_m_false / (nm_m_true + nm_m_false),2), "%")
Less than 2% of observations fail our rule. Let's move forward to applying some Support Vector Machines to our data!
As in the previous notebook jpoirier001.ipynb, let's build a Support Vector Machine classifier. First, we'll build a single classifier and tune it. Second, we'll build two classifiers (one for nonmarine observations, one for marine), as sketched below.
This workflow will be very similar to that of jpoirier001.ipynb, so ultimately we expect similar results.
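As a sketch of the split-classifier idea: assuming two fitted SVMs nm_model and m_model (placeholders for the models we train below), predictions could be routed on the NM_M indicator like so.

# route each observation to the classifier matching its NM_M indicator
split_predict <- function(nm_model, m_model, newdata) {
    out <- character(nrow(newdata))
    nm  <- newdata$NM_M == "Nonmarine"
    out[nm]  <- as.character(predict(nm_model, newdata = newdata[nm, ]))
    out[!nm] <- as.character(predict(m_model,  newdata = newdata[!nm, ]))
    factor(out, levels = facies_labels)
}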
In [12]:
## HELPER FUNCTIONS
## ################################################################################################
# list of adjacent facies
adjacent_facies <- list(as.list(c(2)), as.list(c(1,3)),
as.list(c(2)), as.list(c(5)),
as.list(c(4,6)), as.list(c(5,7,8)),
as.list(c(6,8)), as.list(c(6,7,9)),
as.list(c(7,8)))
# function to calculate the confusion matrix which includes adjacent facies as correctly classified
adj_cm_table <- function(cm_table) {
adj_cm_table <- cm_table
# loop through facies to build adjacent facies confusion matrix
for (i in 1:9) {
cor <- cm_table[i,i]
# move adjacently correct facies into correct facies
for (j in 1:length(adjacent_facies[[i]])) {
cor <- cor + cm_table[adjacent_facies[[i]][[j]],i]
adj_cm_table[adjacent_facies[[i]][[j]],i] <- 0
}
adj_cm_table[i,i] <- cor
}
# return adjacently corrected confusion matrix
adj_cm_table
}
# function to display a confusion matrix
# replaces the integer facies with facies shortnames first
disp_cm <- function(cm) {
dimnames(cm)$Prediction <- facies_labels
dimnames(cm)$Reference <- facies_labels
print(cm)
}
# cm - confusion matrix either as a confusionMatrix object or table object
# x - the data used, from this we calculate how many observations (support) for each facies
accuracy_metrics <- function(cm, x) {
# if given confusion matrix (cm) is a confusionMatrix object
if (class(cm) == "confusionMatrix") {
df <- data.frame("Facies" = facies_labels,
"Precision" = cm[["byClass"]][,5],
"Recall" = cm[["byClass"]][,6],
"F1" = cm[["byClass"]][,7],
"Support" = as.matrix(table(x$Facies)))
df[,-1] <- round(df[,-1],2)
rownames(df) <- NULL
}
# if given confusion matrix is a table object
else if (class(cm) == "table") {
# initialize vectors for precision, recall, and f1 metrics with zeros
prec <- rep(0,9)
recall <- rep(0,9)
f1 <- rep(0,9)
# loop through facies to compute precision, recall, and f1 for each facies
beta <- 1
for (i in 1:9) {
prec[i] <- cm[i,i] / sum(cm[i,])
recall[i] <- cm[i,i] / sum(cm[,i])
f1[i] <- (1 + beta^2) * prec[i] * recall[i] / ((beta^2 * prec[i]) + recall[i])
}
# calculate model metrics for precision, recall, and f1 and output
df <- data.frame(Facies=facies_labels,
Precision=prec,
Recall=recall,
F1=f1,
Support=as.matrix(table(x$Facies)))
# round values to two digits
df[,-1] <- round(df[,-1],2)
}
# average F1 score across all classes
print(paste0("Overall F1-score of: ", round(mean(df$F1, na.rm=T),2)))
print("Accuracy metrics:")
df
}
In [13]:
set.seed(1234)
# split into training and test data sets
feature_vectors <- data[, c("Facies", "GR", "ILD_log10", "DeltaPHI", "PHIND", "PE", "NM_M", "RELPOS")]
trainIndex <- createDataPartition(feature_vectors$Facies, p=.8, list=F, times=1)
x_train <- feature_vectors[trainIndex,]
x_test <- feature_vectors[-trainIndex,]
set.seed(3124)
# calculates models for a variety of hyperparameters - this will take a few minutes
tune.out <- tune(svm, Facies ~ ., data=x_train,
kernel="radial",
ranges=list(cost=c(.01,1,5,10,20,50,100,1000,5000,10000),
gamma=c(.0001,.001,.01,1,10)))
# predict facies using the best model
cv_predictions <- predict(tune.out$best.model, newdata=x_test)
## PART ONE: Confusion matrix for facies classification
"Facies classification confusion matrix:"
cv_cm <- confusionMatrix(cv_predictions, x_test$Facies)
disp_cm(as.matrix(cv_cm[["table"]]))
## PART TWO: Confusion matrix for adjacent facies classification
"Adjacent facies classification confusion matrix:"
cv_adj <- adj_cm_table(as.matrix(cv_cm[["table"]]))
disp_cm(cv_adj)
Now let's apply the model to our blind data set and output the confusion matrices and accuracy metrics!
In [14]:
# relabel the blind well's facies factor levels with their short names
levels(blind$Facies) <- facies_labels
"PART ONE: Facies classification"
# drop the Formation, Well.Name, and Depth columns before predicting
blind_predictions <- predict(tune.out$best.model, newdata=blind[,c(-2,-3,-4)])
blind_cm <- confusionMatrix(blind_predictions, blind$Facies)
disp_cm(as.matrix(blind_cm[["table"]]))
accuracy_metrics(blind_cm, blind)
"PART TWO: Adjacent facies classification"
"Facies classification accuracy metrics"
blind_adj <- adj_cm_table(as.matrix(blind_cm[["table"]]))
disp_cm(blind_adj)
accuracy_metrics(cv_cm, x_test)
These results are consistent with what we observed in jpoirier001.ipynb: overall F1 scores of 0.4 and 0.77 for facies and adjacent facies classification, respectively.
Now let's subset the training data to only the observations with a nonmarine indicator and build a Support Vector Machine model from it. We will perform cross-validation again to see whether the optimal model parameters differ for this subset of the data. We'll hold off on the adjacent facies accuracy analysis until we recombine these classifications with the marine ones.
In [15]:
# subset data - drop "marine" levels and also no longer train using NM_M channel since we only have one value
nm_feature_vectors <- droplevels(feature_vectors[feature_vectors$NM_M == "Nonmarine" &
feature_vectors$Facies %in% c("SS", "CSiS", "FSiS"),-7])
# split into training and cross-validation
set.seed(3124)
nm_trainIndex <- createDataPartition(nm_feature_vectors$Facies, p=.8, list=F, times=1)
nm_x_train <- nm_feature_vectors[nm_trainIndex,]
nm_x_test <- nm_feature_vectors[-nm_trainIndex,]
# calculates models for a variety of hyperparameters - this will take a few minutes
nm_tune.out <- tune(svm, Facies ~ ., data=nm_x_train,
kernel="radial",
ranges=list(cost=c(.01,1,5,10,20,50,100,1000,5000,10000),
gamma=c(.0001,.001,.01,1,10)))
summary(nm_tune.out)
# predict facies using the best model
nm_cv_predictions <- predict(nm_tune.out$best.model, newdata=nm_x_test)
## PART ONE: Confusion matrix for facies classification
"Facies classification confusion matrix:"
nm_cv_cm <- confusionMatrix(nm_cv_predictions, nm_x_test$Facies)
print(as.matrix(nm_cv_cm[["table"]]))
Recall that in jpoirier001.ipynb, cross-validation showed best parameters of cost 10 and gamma 1. Here we observe best parameters of cost 5 and gamma 1 (although the error levels for the two are very close). It is noteworthy that our best cross-validation error has dropped from 0.2475297 in jpoirier001.ipynb to 0.1908166 for nonmarine classifications. What about marine?
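For reference, the winning hyperparameter combination and its cross-validated error can be read straight off the object returned by e1071's tune():

nm_tune.out$best.parameters    # cost and gamma of the best combination
nm_tune.out$best.performance   # cross-validation error of that combination
head(nm_tune.out$performances) # the full (cost, gamma, error, dispersion) grid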
Now let's subset the training data to only include observations with a marine indicator. From this, I will build a support vector machine model. We will perform cross-validation again to see if the optimal model parameters are different for this subset of data.
In [16]:
# subset data - drop "nonmarine" levels and also no longer train using NM_M channel since we only have one value
m_feature_vectors <- droplevels(feature_vectors[feature_vectors$NM_M == "Marine" &
feature_vectors$Facies %in% c("SiSh", "MS", "WS", "D", "PS", "BS"),-7])
# split into training and cross-validation
set.seed(3124)
m_trainIndex <- createDataPartition(m_feature_vectors$Facies, p=.8, list=F, times=1)
m_x_train <- m_feature_vectors[m_trainIndex,]
m_x_test <- m_feature_vectors[-m_trainIndex,]
# calculates models for a variety of hyperparameters - this will take a few minutes
m_tune.out <- tune(svm, Facies ~ ., data=m_x_train,
kernel="radial",
ranges=list(cost=c(.01,1,5,10,20,50,100,1000,5000,10000),
gamma=c(.0001,.001,.01,1,10)))
summary(m_tune.out)
# predict facies using the best model
m_cv_predictions <- predict(m_tune.out$best.model, newdata=m_x_test)
## PART ONE: Confusion matrix for facies classification
"Facies classification confusion matrix:"
m_cv_cm <- confusionMatrix(m_cv_predictions, m_x_test$Facies)
print(as.matrix(m_cv_cm[["table"]]))
For marine facies, our error level has actually risen to 0.2506868. Realistically, repeated k-fold cross-validation here would give us a steadier picture of the error levels; a sketch follows. But for now, let's see what happens when we bring it all together!
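Note that e1071's tune() already performs 10-fold cross-validation by default; a sketch of making the resampling explicit and repeating it for a steadier error estimate follows (the fold and repeat counts, the reduced grid, and the name m_tune.rep are illustrative).

# repeat 10-fold cross-validation five times during the grid search
m_tune.rep <- tune(svm, Facies ~ ., data = m_x_train,
                   kernel = "radial",
                   ranges = list(cost = c(1, 5, 10), gamma = c(.01, .1, 1)),
                   tunecontrol = tune.control(sampling = "cross", cross = 10,
                                              nrepeat = 5))
summary(m_tune.rep)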
Now let's apply the cross-validated models to the blind well data (Newby) and evaluate the results against our initial model built using all data at once.
In [65]:
# relabel the blind well's NM_M factor levels with descriptive strings
levels(blind$NM_M)[levels(blind$NM_M)=="1"] <- "Nonmarine"
levels(blind$NM_M)[levels(blind$NM_M)=="2"] <- "Marine"
# subset data - drop "nonmarine" levels and also no longer train using NM_M channel since we only have one value
nm_blind <- blind[blind$NM_M == "Nonmarine",]
m_blind <- blind[blind$NM_M == "Marine",]
# predict facies using the best models
nm_blind_predictions <- predict(nm_tune.out$best.model, newdata=nm_blind)
m_blind_predictions <- predict(m_tune.out$best.model, newdata=m_blind)
# combine the confusion matrices for the nonmarine and marine predictions
nm_blind_cm <- as.matrix(confusionMatrix(nm_blind_predictions, nm_blind$Facies)[["table"]])
m_blind_cm <- as.matrix(confusionMatrix(m_blind_predictions, m_blind$Facies)[["table"]])
# use cv_adj as a template for a 9x9 table with facies dimnames, then zero it
nmm_blind_cm <- cv_adj
nmm_blind_cm[1:9, 1:9] <- 0
nmm_blind_cm[1:3, 1:3] <- nm_blind_cm[1:3, 1:3]
nmm_blind_cm[4:9, 4:9] <- m_blind_cm[4:9, 4:9]
"PART ONE: Facies classification"
"Facies classification accuracy metrics"
nmm_blind_cm
accuracy_metrics(nmm_blind_cm, blind)
"PART TWO: Adjacent facies classification"
"Facies classification accuracy metrics"
nmm_blind_adj <- adj_cm_table(nmm_blind_cm)
disp_cm(nmm_blind_adj)
accuracy_metrics(nmm_blind_adj, blind)
While the overall F1 score of the adjacent facies classification increased from 0.77 to 0.83, the facies classification score itself stayed the same at 0.4. The SVM algorithm likely identified the strong relationship between the nonmarine/marine indicator and facies on its own, so there was little to gain by imposing the split manually. Let's try something else next.